Hello everyone.
Today's topic is
Triton and symbolic execution on GDB.
My name is Weibo Chen.
My usual ID on the internet is
BananaAppleTW.
I recently graduated from
the University of California.
When I was at school,
I was a member of the school's security club,
NCTU-CSC,
a network security association.
I played CTFs
with the team BambooFox.
I'm good at
symbolic execution
and binary exploitation,
and I have talked about them
at HITCON in 2015 and 2017.
This is today's outline.
I will introduce
symbolic execution
and the Triton framework.
Then I will talk about
how I created
the tool SymGDB,
summarize
Triton's shortcomings,
and compare it
with other frameworks.
Why symbolic execution?
In the past,
we used
static analysis
and dynamic analysis
to analyze our programs.
Static analysis tools include
objdump,
which lets you look at
the contents of each segment,
and IDA Pro,
which is a disassembler.
Dynamic analysis tools
include
GDB,
ltrace,
and strace.
GDB is
the GNU debugger.
ltrace is a Linux tool
that traces
library calls.
strace is a tool
that traces
system calls.
You may find
this quite troublesome.
Every time you use these tools
to inspect a program by hand,
your head is about to explode.
Symbolic execution
is a convenient way
to automatically analyze
a program
and work out what input you need
to reach a given point.
What is symbolic execution?
It's a technique
that finds
the input
that drives execution to
a certain part of the program.
A commonly used
system-level framework is
S2E.
System level
covers both kernel space
and user space.
For the user level,
a commonly used framework
is KLEE.
But KLEE is a bit special,
because it needs
your source code,
which it compiles into LLVM IR
before executing it.
Let's look at an example.
Here is a function f.
y is our input,
which the program reads in.
z equals y times 2.
Next, it checks
whether z equals 12.
When symbolic execution
encounters
this branch,
it turns the condition into a constraint
and solves it for us.
If I want to reach the fail state,
it solves for
y equals 6.
If I want to reach the OK state,
it finds some y
that is not equal to 6.
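The branch in f can be made concrete with a small sketch. It assumes f reads y, computes z = y * 2, and takes the fail branch when z == 12. A real engine would hand each path condition to an SMT solver. The hypothetical `solve` helper below simply searches a small range of integers, which is enough to show how each state maps to a constraint over y.

```python
def f(y):
    """Concrete version of the example function."""
    z = y * 2
    return "fail" if z == 12 else "ok"

# Path condition for each state of f, expressed over the symbolic input y.
PATHS = {
    "fail": lambda y: y * 2 == 12,
    "ok": lambda y: y * 2 != 12,
}

def solve(constraint, candidates=range(-100, 101)):
    """Stand-in for an SMT solver: first candidate satisfying the constraint, or None."""
    for y in candidates:
        if constraint(y):
            return y
    return None

for state, constraint in PATHS.items():
    y = solve(constraint)
    print(f"{state}: y = {y}, f(y) = {f(y)}")
```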
Next, I will introduce
Triton.
Triton is a framework
developed by
Jonathan Salwan.
It's an open-source
project.
Although he is the main
developer,
other users also write code
and send pull requests
to help develop it.
It has Python bindings.
Its components
include a symbolic execution engine,
tracers,
AST representations,
and an SMT solver interface.
Here is an overview of its structure.
You have a tracer,
such as Pin,
Valgrind,
or QEMU,
or you can store your instructions
in a database,
like MySQL,
and read them back out.
After you feed them to Triton,
its components
let you specify
which variables
are symbolic,
and the SMT solver
solves the constraints for you.
Let me introduce
the symbolic execution engine.
The symbolic execution engine
is the engine
that stores your symbolic state.
It stores three things:
your symbolic register state,
your symbolic memory state,
and the references
that link them together.
For example,
in the initial state,
EAX is unset,
which means we haven't executed any instructions yet.
After the first instruction,
mov eax, 0,
EAX is 0,
and EAX now points to reference φ1.
Next,
we execute the second instruction,
inc eax,
which adds 1 to EAX.
EAX now points to
φ2,
which is defined in terms of the previous
reference: φ2 = φ1 + 1.
Then comes the third instruction,
add eax, 5.
EAX now points to φ3,
which equals
φ2 plus 5.
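Here is a minimal sketch of the bookkeeping described above. It assumes a toy engine that only understands these three instructions. The `SymbolicEngine` class and its methods are hypothetical and are not Triton's API. Every write to a register creates a new reference, and the register table only records which reference is current.

```python
class SymbolicEngine:
    """Toy model of a symbolic register table plus a global list of references."""

    def __init__(self):
        self.refs = []   # refs[i] is the expression of reference φ(i+1)
        self.regs = {}   # register name -> reference id; missing means UNSET

    def current(self, reg):
        """Name of the reference a register points to, or UNSET."""
        ref = self.regs.get(reg)
        return f"φ{ref}" if ref is not None else "UNSET"

    def _assign(self, reg, expr):
        # SSA-style: every write creates a brand-new reference.
        self.refs.append(expr)
        self.regs[reg] = len(self.refs)

    def mov_imm(self, reg, value):
        self._assign(reg, str(value))

    def inc(self, reg):
        self._assign(reg, f"{self.current(reg)} + 1")

    def add_imm(self, reg, value):
        self._assign(reg, f"{self.current(reg)} + {value}")

    def dump(self):
        return [f"φ{i} = {expr}" for i, expr in enumerate(self.refs, start=1)]

engine = SymbolicEngine()
print(engine.current("eax"))   # initial state: nothing executed yet
engine.mov_imm("eax", 0)       # mov eax, 0
engine.inc("eax")              # inc eax
engine.add_imm("eax", 5)       # add eax, 5
print("\n".join(engine.dump()))
print("eax ->", engine.current("eax"))
```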
What conditions should
a tracer meet?
It needs to provide two things.
First,
it must tell us which
opcode is being executed.
Second,
during execution,
it must tell us
the register and memory state.
With these two pieces of information,
Triton turns your control flow
into AST representations.
In Triton's
project,
you can find
many
Pin tracer examples.
Pin is a dynamic binary
instrumentation tool
developed by
Intel.
What is an AST representation?
Triton supports
two architectures,
x86 and x86-64.
It turns the semantics of their instructions
into AST representations.
Triton's expressions
are all in SSA form.
Let's look at an example.
Our instruction
is add rax, rdx.
This instruction
adds RAX and RDX
and then
saves the result in RAX.
So,
reference #41 is
the new reference of RAX.
Reference #40 is
the reference of RAX
at the moment we execute this instruction.
Reference #39 is
the reference of RDX
at the moment we execute this instruction.
(_ extract 63 0)
means
that we take
bits 0 to 63,
because RAX is 8 bytes,
which is 64 bits.
After extracting both operands,
we add the two bit-vectors with bvadd,
and the result is reference #41.
We can see that
after these five instructions,
the AST representation
looks like a tree.
The instructions are
mov al, 1;
mov cl, 10;
mov dl, 20;
xor cl, dl;
and add al, cl.
The leaves are bit-vector constants
for AL, CL, and DL,
written as a value on the left
and a size in bits on the right.
Then CL and DL
are combined by a bvxor node.
Finally, we add AL and CL,
and we get
the bvadd node at the top of the tree.
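To see how those five instructions turn into a tree, here is a small sketch. It assumes only two operations, bvadd and bvxor, on 8-bit values. The `Bv` and `Op` classes are hypothetical. The printed form imitates SMT-LIB notation but is not Triton's exact output.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Bv:
    """Bit-vector constant, printed like SMT-LIB (_ bvVALUE SIZE)."""
    value: int
    size: int

    def eval(self) -> int:
        return self.value & ((1 << self.size) - 1)

    def __str__(self) -> str:
        return f"(_ bv{self.value} {self.size})"

@dataclass(frozen=True)
class Op:
    """Binary bit-vector node: bvadd or bvxor over two children of equal size."""
    name: str
    left: "Node"
    right: "Node"

    @property
    def size(self) -> int:
        return self.left.size

    def eval(self) -> int:
        a, b = self.left.eval(), self.right.eval()
        result = a + b if self.name == "bvadd" else a ^ b
        return result & ((1 << self.size) - 1)   # wrap like an 8-bit register

    def __str__(self) -> str:
        return f"({self.name} {self.left} {self.right})"

Node = Union[Bv, Op]

# Each register maps to the AST of its current value.
regs = {}
regs["al"] = Bv(1, 8)                             # mov al, 1
regs["cl"] = Bv(10, 8)                            # mov cl, 10
regs["dl"] = Bv(20, 8)                            # mov dl, 20
regs["cl"] = Op("bvxor", regs["cl"], regs["dl"])  # xor cl, dl
regs["al"] = Op("bvadd", regs["al"], regs["cl"])  # add al, cl

print(regs["al"])
print(regs["al"].eval())
```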
What is SSA?
I mentioned that
Triton's AST expressions
are in SSA form.
SSA stands for
static single assignment form:
every variable is assigned exactly once.
For example, say we have y = 1,
then y = 2, and then x = y.
In SSA form,
this becomes y1 = 1, y2 = 2,
and x1 = y2.
Each assignment
creates a new version
that has nothing to do with the previous one.
This makes it easy
to eliminate unnecessary computations.
For example, y1 here
is never used, so we can delete it.
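The renaming and the removal of y1 can be sketched in a few lines. The sketch assumes straight-line code where each statement assigns either a constant or a single variable to a target. `to_ssa` and `eliminate_dead` are illustrative helpers, not part of Triton.

```python
def to_ssa(statements):
    """Rename (target, source) assignments so each variable is assigned exactly once."""
    version = {}
    renamed = []
    for target, source in statements:
        # A source naming a variable refers to that variable's latest version.
        if source in version:
            source = f"{source}{version[source]}"
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}{version[target]}", source))
    return renamed

def eliminate_dead(ssa, live_outputs):
    """Drop assignments whose result is never used, scanning backwards."""
    needed = set(live_outputs)
    kept = []
    for target, source in reversed(ssa):
        if target in needed:
            kept.append((target, source))
            needed.add(source)
    return list(reversed(kept))

program = [("y", "1"), ("y", "2"), ("x", "y")]
ssa = to_ssa(program)
print(ssa)
print(eliminate_dead(ssa, ["x1"]))
```

Because every name is assigned once, "is this value used?" becomes a simple backwards lookup. That is why SSA makes y1 trivially removable.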
Next, I will introduce the concept of a symbolic variable.
You can think of "symbolic" as an infectious disease:
when an operand of an instruction is symbolic,
the memory or registers written by that instruction
also become symbolic.
In Triton,
we can use these two functions to control it.
The first is convertRegisterToSymbolicVariable,
which turns a register into a symbolic variable.
For example, if we want ECX
to be symbolic,
we pass ECX to this function,
and it marks it as symbolic.
Next, we can use isRegisterSymbolized
to check whether a register is symbolic.
Let's look at an example,
where we make ECX a symbolic variable.
The code on the right is Triton code,
and the text on the left describes it.
We convert ECX to a symbolic variable,
then we check that it is symbolic,
and the call returns True.
Next, we execute test ecx, ecx.
test is an instruction you typically use before a conditional jump.
It ANDs the two registers
and sets the flags.
The flags register is one of the registers in the CPU.
Here it sets the zero flag:
if ecx AND ecx equals 0,
the zero flag is set.
So if ECX is 0,
the zero flag is set to 1;
otherwise, it is set to 0.
But this detail is not important.
Just remember that
ECX was originally symbolic,
and after the test,
the zero flag depends on ECX.
The jump that follows uses the zero flag
to decide where the next instruction is,
so EIP also becomes symbolic.
If the zero flag equals 1,
it jumps to the fifth instruction.
Otherwise, it goes directly to the next instruction,
without jumping.
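The "infection" rule can be modeled in a few lines. This is a hypothetical sketch: the class and method names only mirror the idea of convertRegisterToSymbolicVariable and isRegisterSymbolized and are not Triton's API. `test` really writes several flags, but only ZF is modeled here.

```python
class SymbolicTracker:
    """Track which registers and flags currently hold symbolic values."""

    def __init__(self):
        self.symbolic = set()

    def symbolize(self, name):
        self.symbolic.add(name)

    def is_symbolized(self, name):
        return name in self.symbolic

    def execute(self, writes, reads):
        """An instruction's outputs are symbolic iff any of its inputs is symbolic."""
        tainted = any(r in self.symbolic for r in reads)
        for w in writes:
            if tainted:
                self.symbolic.add(w)
            else:
                self.symbolic.discard(w)   # overwritten with a concrete value

t = SymbolicTracker()
t.symbolize("ecx")
print(t.is_symbolized("ecx"))
t.execute(writes=["zf"], reads=["ecx", "ecx"])  # test ecx, ecx (only ZF modelled)
t.execute(writes=["eip"], reads=["zf"])         # jz: next EIP depends on ZF
t.execute(writes=["eax"], reads=[])             # mov eax, 1: concrete
print(t.is_symbolized("eip"), t.is_symbolized("eax"))
```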
OK.
So,
the way we usually use it is this.
Remember the AST tree from before?
Each node is actually the expression of a register or memory cell,
so we can pick the node we want.
For example, if I am interested in EAX,
I pick EAX's node,
and then add my own constraint to the path constraints.
Then,
the path constraints and my constraint are combined with AND
and passed to the SMT solver.
It solves for values of the symbolic variables
that satisfy them.
Finally, I get the answer.
Now,
let's see how Triton is used on a CTF challenge.
This example is
r100 from DefCamp CTF 2015.
This program asks you to enter a password,
up to 255 characters long.
Next,
it runs a series of checks
to decide whether the password is right or wrong.
This is its code;
it is a little small on the slide.
First, we use fgets to read the input.
Then,
the input is passed to the function sub_4006FD,
which decides whether it is right or wrong.
If it returns 0,
the password is correct
and the program prints "Nice!".
So,
we need that function to return 0.
However,
there are places in the middle where it returns 1,
and we can't let that happen.
The check goes through 12 rounds,
and if any one of them returns 1,
we fail.
In each round,
if the value it checks is not equal to 1,
it returns 1.
So,
we have to make it equal to 1 in every round.
Okay.
Next,
we use Triton to solve this problem.
First, we import triton
and initialize a TritonContext.
Then,
we set its architecture,
which is x86-64.
Next,
we load the binary.
We use LIEF,
a Python module,
to parse the binary's headers,
and we use each segment's virtual address and offset
to load it into memory.
Then,
we use Triton's setConcreteMemoryAreaValue
to put those bytes into Triton's engine.
Next,
we pick an arbitrary stack.
We set two registers:
EBP and ESP on 32-bit,
or RBP and RSP on 64-bit.
These hold the stack base address
and the stack pointer.
Then,
we define the user input.
On x86-64,
the first argument
is passed in
the RDI register.
Because we don't know
how long the password is,
we assume that
it is 30 bytes long.
So,
we take that memory region,
30 bytes long,
and mark it
as symbolic input.
Then,
we start executing:
we jump directly to
the check function.
While it executes,
we rely on the segments we already loaded
based on the binary headers,
including the .text segment,
the segment that contains executable code.
So we fetch the opcode bytes directly from memory.
After fetching them,
we wrap them in an Instruction object
and let Triton process it.
Processing updates
the symbolic state
as it runs.
Then,
we update
the program counter:
pc equals
the concrete value of RIP.
After each instruction,
RIP holds the address of the next one,
so it keeps executing
like this,
in a loop,
until we break out.
As we just said,
the check must equal 1.
So,
we stop at the point
where it compares against 1:
at address 4078B,
it compares
whether EAX is equal to 1.
After stopping here,
we build the constraint
EAX == 1.
Then,
we combine it with the path constraints
collected from the start up to this point,
and pass them to the SMT solver
through Triton's getModel.
It solves for
the values of the symbolic variables.
The result looks like this:
round by round,
it recovers
our password
one character at a time.
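Each round of the check is a simple linear constraint, which is why the solver can peel the password off one character at a time. The following is a pure-Python sketch, not Triton. It assumes the decompiled sub_4006FD has the shape modeled below. The three table strings are how the challenge is commonly reconstructed from the binary, so verify them against your own disassembly. Each round demands `table_char - input_char == 1`, and the sketch solves that directly for every round.

```python
# Assumed reconstruction of sub_4006FD; verify the table against your own disassembly.
TABLE = ["Dufhbmf", "pG`imos", "ewUglpt"]
ROUNDS = 12

def table_byte(i: int) -> int:
    """Byte the i-th round compares against."""
    return ord(TABLE[i % 3][2 * (i // 3)])

def check(password: bytes) -> int:
    """Model of the checker: 0 on success, 1 as soon as one round fails."""
    if len(password) < ROUNDS:   # guard added for the model, avoids IndexError
        return 1
    for i in range(ROUNDS):
        if table_byte(i) - password[i] != 1:
            return 1
    return 0

def solve_rounds() -> bytes:
    """Each round is table_byte(i) - x == 1, so x = table_byte(i) - 1."""
    return bytes(table_byte(i) - 1 for i in range(ROUNDS))

password = solve_rounds()
print(password.decode(), check(password))
```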
Okay.
You can probably feel that
this whole process
is quite troublesome:
you need to
understand a lot of concepts
before you can use
this tool.
But
if you write something yourself,
something like a debugger,
that provides the values for you,
a lot of this work
can be omitted.
So,
that is today's topic:
a tool
I wrote myself
called
SymGDB.
It is a
symbolic execution
plugin
for GDB.
It is mainly
made up of two parts.
One is
Triton, which we just talked about.
The other is
the GDB Python API.
It also provides
a symbolic environment,
meaning that it can symbolize
argv.
This is roughly
the outline.
First,
let me introduce
the GDB Python API.
In fact,
there is very little documentation
for the GDB Python API.
This is probably
the best resource
you can find.
If you want to use
the GDB Python API,
you create
a file
called
.gdbinit,
and in it
you can source
your Python scripts.
With it,
you can register
your own commands
or
register
event handlers.
You can also
execute
GDB commands
and get
their output
back.
For example,
if you want
the values of
all the registers,
you can execute
info registers,
cut out
what you want,
and pass it
to your program.
You can also
write
or search
memory.
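The "execute a command, then cut out what you want" pattern is mostly text parsing. Inside GDB, `gdb.execute("info registers", to_string=True)` returns the command's output as a string. The sample below is a hard-coded imitation of that output's layout (name, hex value, natural value), so the parser can run outside GDB. `parse_registers` is a hypothetical helper, not SymGDB's code.

```python
import re

# Inside GDB the text would come from gdb.execute("info registers", to_string=True).
# A hard-coded sample keeps this parser runnable outside GDB.
SAMPLE = """\
rax            0x1c                28
rbx            0x0                 0
rsp            0x7fffffffe3f8      0x7fffffffe3f8
rip            0x400526            0x400526 <main+4>
eflags         0x246               [ PF ZF IF ]
"""

REG_LINE = re.compile(r"^(\w+)\s+(0x[0-9a-fA-F]+)")

def parse_registers(text: str) -> dict:
    """Map register names to integer values from 'info registers'-style output."""
    registers = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers

registers = parse_registers(SAMPLE)
print({name: hex(value) for name, value in registers.items()})
```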
This is some code from my project.
I registered a command called triton.
To do so, you create a class called Triton.
When you type the command triton,
GDB calls its invoke method.
In invoke, I call the run method of my Symbolic class,
which does the execution.
Then, when you instantiate the Triton class,
it adds the command triton to GDB.
We can also register event handlers.
For example, when such an event fires,
I reset all the state,
like the GDB utilities and the architecture.
Or, we can get the stack's starting address,
that is, where the stack starts.
We run info proc all
and use a regular expression
to cut out the part we want.
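The stack lookup can be sketched the same way. The sample imitates the layout of GDB's `info proc mappings` output, which `info proc all` also prints. Newer GDB versions add a permissions column, which the regular expression tolerates. `stack_range` is a hypothetical helper for illustration.

```python
import re

# Shape of GDB's "info proc mappings" output (also printed by "info proc all").
# Inside GDB: text = gdb.execute("info proc mappings", to_string=True)
SAMPLE = """\
process 4242
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x400000           0x401000     0x1000        0x0 /tmp/crackme
      0x7ffff7dd7000     0x7ffff7dfd000    0x26000        0x0 /lib/x86_64-linux-gnu/ld-2.23.so
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
"""

STACK_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s.*\[stack\]\s*$")

def stack_range(mappings_text: str):
    """Return (start, end) of the [stack] mapping, or None if it is not listed."""
    for line in mappings_text.splitlines():
        match = STACK_LINE.match(line)
        if match:
            return int(match.group(1), 16), int(match.group(2), 16)
    return None

start, end = stack_range(SAMPLE)
print(hex(start), hex(end))
```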
Or, we can read memory.
I wrote a function, getMemory,
which takes two parameters:
the address and the size.
Writing memory is mainly needed
because this is a symbolic execution plugin.
After I solve for the memory values,
I can write them back into GDB.
Then, GDB can continue executing
and reach the path we want.
Actually,
at the beginning,
I tried an approach
that turned out to be wrong.
But I think it's worth talking about,
so I'll share it with you.
When I first started this project,
Triton had Pin support,
so it was usually used with a Pin tool.
The way it works
is that it registers two callbacks.
The first callback is
"need concrete memory value".
The second is
"need concrete register value".
Whenever Triton needs
a memory or register value,
it calls the corresponding callback.
In the callback,
you have to find a way
to get the value
from your debugger or your tracer,
and set it into Triton's context,
so that it can execute
the instruction correctly.
For example,
take mov eax, 5
followed by mov ebx, eax.
The second instruction, mov ebx, eax,
triggers your callback,
because it needs the value of EAX.
But, in fact,
this doesn't work with GDB.
Triton itself already has EAX equal to 5,
because it executed mov eax, 5.
When it executes mov ebx, eax,
the callback fetches EAX from GDB.
Getting the value from GDB would only be correct
if GDB had also executed mov eax, 5.
But GDB has not run it;
only Triton has.
So the value we get from GDB is stale.
There is one way to try to fix this.
In the callback,
I first check whether Triton
already has this value.
If it does,
I take it from Triton;
if not,
I take it from GDB.
That should be correct.
But, in practice,
after testing,
it enters an infinite loop.
When I ask Triton for the value,
Triton triggers the callback again,
and the callback asks Triton again,
so it loops forever.
So that approach was useless.
Later, I came up with a different approach:
copy all of GDB's state into Triton.
This is better, because Triton doesn't affect GDB;
they no longer need to run in step with each other.
Triton can keep running on its own copy
without affecting GDB.
So, our flow is:
first, we use the GDB Python API
to get the state of GDB.
Then, we give this state to Triton
and set the values in Triton's context,
so that Triton can run
independently of GDB.
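Here is a sketch of why the snapshot approach avoids both the stale read and the callback recursion. `FakeDebugger` and `Emulator` are hypothetical stand-ins for GDB and Triton, and no real API is used. The emulator copies everything once, up front, and then never calls back into the debugger.

```python
import copy

class FakeDebugger:
    """Stand-in for GDB: owns the real, concrete process state."""

    def __init__(self):
        self.registers = {"rip": 0x400526, "rax": 0, "rbx": 0}
        self.memory = {0x601040: 0x41}

class Emulator:
    """Stand-in for Triton: runs on a private snapshot of the debugger's state."""

    def __init__(self, debugger):
        # Copy everything once, up front. There are no callbacks back into the
        # debugger, so there is no stale read and no callback recursion.
        self.registers = copy.deepcopy(debugger.registers)
        self.memory = copy.deepcopy(debugger.memory)

    def mov_imm(self, dst, value):
        self.registers[dst] = value

    def mov_reg(self, dst, src):
        self.registers[dst] = self.registers[src]

gdb_side = FakeDebugger()
emu = Emulator(gdb_side)
emu.mov_imm("rax", 5)       # mov rax, 5 (only the emulator executes this)
emu.mov_reg("rbx", "rax")   # mov rbx, rax sees 5, not the debugger's stale 0
print(emu.registers["rbx"], gdb_side.registers["rax"])
```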
When I implemented it, I divided it into three classes.
The first class is Arch.
Because Triton supports x86 and x86-64,
the pointer size and the register names differ between the two architectures.
That's why we have Arch.
GdbUtil contains some convenient functions I wrote
that let me get the state of GDB.
Symbolic calls Triton to get the answer,
after giving it a set of constraints.
That's about it.
The commands I designed are:
symbolize, which marks a memory region or argv
as symbolic;
target, which specifies the address I want to reach;
and triton, which runs symbolic execution directly.
answer and debug are more like helper commands I use while developing.
So at the beginning, I call info registers
to get all the registers,
and read memory to get the segments.
Then I set these values into Triton's state.
Next, I call isRegisterSymbolized
to check whether the PC has become symbolic.
If it has,
I try to find the input that leads to the address I need:
I add "PC equals the target address" as a constraint.
Finally, I call getModel to get the answer,
and then write it back into GDB using writeMemory.
In fact, some of its features are quite simple.
For example, if you want to symbolize the argv parameters,
you just call info proc all
to get the starting position of the stack.
From that starting position,
the layout is argc, then argv[0], argv[1], and so on,
as an array,
followed by a NULL terminator,
then your environment variables,
and then another NULL terminator.
So you can easily get the position of argv[0].
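The layout above can be checked with a small packing and unpacking sketch. It assumes x86-64 (8-byte little-endian pointers). It also assumes that the "starting position" is the slot holding argc, that is, the initial stack pointer at process entry. The auxiliary vector that follows envp is omitted, and the string addresses are hypothetical.

```python
import struct

PTR = struct.Struct("<Q")   # one 64-bit little-endian pointer (x86-64)

def build_initial_stack(argv_ptrs, envp_ptrs):
    """Lay out argc, argv[], NULL, envp[], NULL as Linux does at process start."""
    words = [len(argv_ptrs), *argv_ptrs, 0, *envp_ptrs, 0]
    return b"".join(PTR.pack(word) for word in words)

def parse_initial_stack(stack: bytes):
    """Recover argc, the argv pointers and the envp pointers from the stack bytes."""
    words = [PTR.unpack_from(stack, off)[0] for off in range(0, len(stack), PTR.size)]
    argc = words[0]
    argv = words[1:1 + argc]
    if words[1 + argc] != 0:
        raise ValueError("argv is not NULL-terminated")
    envp = []
    for word in words[2 + argc:]:
        if word == 0:
            break
        envp.append(word)
    return argc, argv, envp

def argv_slot_offset(index: int) -> int:
    """Byte offset of the argv[index] pointer, relative to argc's slot."""
    return PTR.size * (1 + index)

# Hypothetical string addresses, purely for illustration.
stack = build_initial_stack([0x7FFFFFFFE6A0, 0x7FFFFFFFE6B0], [0x7FFFFFFFE6C0])
print(parse_initial_stack(stack))
print(argv_slot_offset(1))
```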
When I debug, I use
a Triton function that is quite handy:
it can simplify your AST expressions.
Let's look at the first one.
It is an XOR of two bit-vectors:
the first bit-vector's value is 1,
and the one after it is also 1.
XORing 1 with 1 gives 0,
so the simplified expression
is directly the bit-vector value 0.
At the beginning, when the program didn't behave as expected,
I would look at each node
to see which instruction was wrong,
or which one didn't run.
Simplification is quite useful at that point.
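Here is a rule-based sketch of the kind of rewrite described above, limited to XOR nodes. The `Const`, `Var` and `Xor` classes and the three rules are illustrative assumptions. Triton's own simplification passes are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int
    size: int

@dataclass(frozen=True)
class Var:
    name: str
    size: int

@dataclass(frozen=True)
class Xor:
    left: object
    right: object

    @property
    def size(self) -> int:
        return self.left.size

def simplify(node):
    """Bottom-up rewriting of XOR nodes with a few algebraic rules."""
    if not isinstance(node, Xor):
        return node
    left, right = simplify(node.left), simplify(node.right)
    if left == right:                                      # x ^ x  ->  0
        return Const(0, left.size)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value ^ right.value, left.size)  # fold constants
    if isinstance(right, Const) and right.value == 0:      # x ^ 0  ->  x
        return left
    if isinstance(left, Const) and left.value == 0:        # 0 ^ x  ->  x
        return right
    return Xor(left, right)

print(simplify(Xor(Const(1, 8), Const(1, 8))))
print(simplify(Xor(Var("x", 32), Var("x", 32))))
```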
Next is a demo.
There are two demos.
The first example is crackme_hash.
It is an example from another project.
It passes argv[1] to the check function.
In the check function,
each character is XORed with serial,
which is a fixed string,
and the results are added up.
If the sum equals 0xABCD,
it prints win.
Otherwise, it prints fail.
This is the whole code.
We need to reach this point,
which is 0x80484B1.
This is the address we want to reach.
Could you play the demo video, please?
Thank you.
Now we use GDB
to load this program.
First, we set a breakpoint at main.
Then we set argv[1] directly.
Then we use the symbolize command to make it symbolic.
Then we set the address we want to reach,
which is the target.
Then we run the program in GDB.
Then we call the triton command,
and it solves for the answer.
Finally, it asks whether we want to
inject the answer back into GDB.
We answer yes.
Then we continue execution,
and it prints win.
The next example is crackme_xor.
It is also taken from someone else's project.
It works the same way:
it takes the input from argv
and passes it to the check function.
It XORs every character with 0x55.
If the XOR result does not match
the fixed character,
it returns 1
and prints fail.
Otherwise, it goes on to the next iteration.
So, we have to pass every iteration
to reach the final return 0
and print win.
This is the code.
So, we have to find out
where, in each iteration,
it performs the comparison.
This is the address 08048447.
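Every loop iteration contributes one independent constraint, `input[i] ^ 0x55 == serial[i]`. This is why the demo solves the loop one iteration at a time. In this sketch, the expected bytes are a hypothetical serial chosen for illustration; the real crackme stores its own fixed array. `check` and `solve` are illustrative helpers.

```python
KEY = 0x55
# Hypothetical expected bytes; the real crackme stores its own fixed array.
SERIAL = bytes(c ^ KEY for c in b"elite")

def check(user_input: bytes) -> int:
    """Model of the loop: 1 (fail) at the first mismatch, 0 (win) if all match."""
    if len(user_input) < len(SERIAL):
        return 1
    for i, expected in enumerate(SERIAL):
        if user_input[i] ^ KEY != expected:
            return 1
    return 0

def solve() -> bytes:
    """Each iteration adds input[i] ^ KEY == SERIAL[i]; XOR is its own inverse."""
    return bytes(b ^ KEY for b in SERIAL)

answer = solve()
print(answer, check(answer))
```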
Okay.
Please play it.
Again, we load the program in GDB.
Then we set a breakpoint at main.
Then we also set a breakpoint at
the comparison we just found.
Then we mark our argv as symbolic.
Then we specify the target address.
Then we start executing.
When we run it,
we can pass any input at random,
because symbolic execution
will compute the correct value in the end.
Then, when you inject the answer back into GDB,
it simply overwrites the original input.
Now it is slowly solving the loop.
Because it is a loop,
it has to solve it one iteration at a time.
Finally,
because we inject every answer back into the original state,
when you continue from the breakpoint,
your values will be correct.
Yes.
Finally, the word win is printed.
Okay.
In fact,
most of the time when I'm developing,
I write a GDB command file.
I don't know if everyone knows this,
so I will explain.
GDB can read commands from a file:
you save the commands you want to execute
in a text file,
and GDB reads them
and runs them for you.
This is very convenient.
(Can you play it for me?)
When I am testing,
it is very troublesome to keep typing commands.
So you can write the commands first,
and when you start GDB,
it runs all of them for you.
Yes.
Roughly like this.
Here,
the only step that interacts with me
is when it asks whether to write the answer back to GDB.
The other commands are read from that file,
so I don't have to type them.
Then,
the next demo shows that
I can combine it with other GDB plugins.
For example,
I don't know if you have heard of
a GDB plugin called PEDA;
it is quite famous.
It provides many very useful GDB functions
that you can use as commands.
So,
this is the same demo as before.
But the difference is that
before, we used the method I wrote
to find the position of argv directly.
Instead, we can first give an input value.
Then, after finding this value in memory,
we can mark that memory as symbolic,
and then execute.
Please play it for me.
So, this time,
we won't symbolize argv directly.
The output at the top is from PEDA.
It is quite convenient:
it prints the state of your CPU
every time the program stops,
and shows which assembly instruction
it is executing.
In fact,
we just give five a's as input.
Then, we use PEDA
to find their position in memory.
Now that we know this position,
we mark these 5 bytes as symbolic.
The rest of the demo
is the same as before.
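The "find the marker, then symbolize it" step can be sketched as a plain byte search over a memory snapshot. `search_memory`, `symbolic_bytes` and the segment contents are hypothetical. PEDA's own search command is not reproduced here.

```python
def search_memory(segments, pattern: bytes):
    """Return every absolute address where pattern occurs in the given segments."""
    hits = []
    for base, data in sorted(segments.items()):
        offset = data.find(pattern)
        while offset != -1:
            hits.append(base + offset)
            offset = data.find(pattern, offset + 1)
    return hits

def symbolic_bytes(address: int, length: int) -> set:
    """Addresses that would be marked symbolic for a buffer of the given length."""
    return set(range(address, address + length))

# Hypothetical memory snapshot: segment base address -> bytes.
memory = {
    0x601000: b"\x00" * 16 + b"aaaaa" + b"\x00",
    0x7FFFFFFDE000: b"./crackme\x00aaaaa\x00",
}
hits = search_memory(memory, b"aaaaa")
print([hex(h) for h in hits])
print(sorted(symbolic_bytes(hits[-1], 5)))
```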
So, to conclude:
we use the GDB debugger
to provide the basic information.
This saves a lot of work
and makes the tool easier for everyone to use.
One of the advantages is that
it is independent.
That is to say,
when you call this tool,
the execution state of the original GDB session
is not affected,
unless you choose to write the answer back into GDB,
which replaces the original input
with the correct one.
It also has another advantage:
it can do concolic execution.
What is concolic execution?
Concolic is a portmanteau:
concrete plus symbolic execution.
It uses symbolic variables
and concrete values
at the same time.
The advantage is that
it is very fast.
In addition,
in pure symbolic execution,
you have to predict the next EIP or RIP
from the instruction semantics you implemented.
But with concrete values,
you can't go wrong,
because you take the real environment
and feed it in.
So, you don't need to worry
that your state will differ
from that of the real CPU.
In fact,
it is very troublesome
to write instruction semantics,
because you have to guarantee
that every one of them is correct.
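Here is a toy concolic loop in the style of DART-like concolic testing, to show how concrete runs and path constraints work together. It is not SymGDB's or Triton's implementation. The program runs on a real input while recording each branch predicate and its outcome. Then each branch in turn is negated and solved to get an input for a new path. `solve` is a brute-force stand-in for an SMT solver over a small range.

```python
def program(x, trace):
    """Toy program under test; each branch records (predicate, outcome) in trace."""
    def branch(pred):
        taken = pred(x)
        trace.append((pred, taken))
        return taken

    if branch(lambda v: v > 10):
        if branch(lambda v: v * 3 == 42):
            return "bug"
        return "big"
    return "small"

def solve(constraints, candidates=range(-64, 65)):
    """Brute-force stand-in for an SMT solver over a small input range."""
    for v in candidates:
        if all(pred(v) == want for pred, want in constraints):
            return v
    return None

def concolic(seed):
    """Run on concrete inputs, then negate one branch at a time to reach new paths."""
    results, worklist, seen = {}, [seed], set()
    while worklist:
        x = worklist.pop()
        if x in seen:
            continue
        seen.add(x)
        trace = []                                # symbolic side: path constraints
        results.setdefault(program(x, trace), x)  # concrete side: real execution
        for i, (pred, taken) in enumerate(trace):
            # Keep the path prefix, flip the i-th branch, and ask for an input.
            new_input = solve(trace[:i] + [(pred, not taken)])
            if new_input is not None and new_input not in seen:
                worklist.append(new_input)
    return results

print(concolic(0))
```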
Okay,
now some drawbacks:
Triton's drawbacks.
First,
it doesn't support the GNU C library,
which we normally use,
for example, functions like scanf and read.
Why is that?
Because its semantics only support some instructions;
at this link there is its list of supported instructions.
If you need to support the GNU C library,
you must support interrupts,
that is, the instructions that make system calls,
because most library functions use system calls.
Only once you support these instructions
can you support the GNU C library.
Second,
you have to make your own path choices.
For example, with the loop from before,
which goes through five iterations,
you have to go into that loop
and choose the path yourself.
Now,
let's compare Triton
with other symbolic execution frameworks.
There are two to compare.
The first one is KLEE.
The other one is angr.
KLEE is a symbolic virtual machine
built on top of the LLVM compiler infrastructure.
This is its website
and its GitHub.
If you are interested
in the KLEE project,
you can read this paper.
Actually, there are only a few papers,
and most of them talk about
how KLEE works
and what it does.
The main goal of KLEE
is to achieve as high code coverage as possible:
it wants every part of the program
to be executed.
It also wants to detect
which operations are dangerous.
For example, dividing a number by zero
is dangerous,
because it will crash the whole program,
and some bugs are caused by exactly this.
So, it is a framework
whose main purpose
is to generate test cases.
But the program must be compiled
into LLVM bitcode,
so you must have the source code.
To run it,
you replace the input
with a call to KLEE's function,
which marks that part as symbolic,
and you also give it information about the input.
For example,
we have int a.
Then, we call klee_make_symbolic
to make a symbolic.
Then, we pass it to get_sign,
and KLEE generates all the paths.
So, it generates all three paths:
return 0, return -1, and return 1.
And this is what it looks like
when compiled into LLVM IR.
Yep.
So, this is the result of its run.
We get three test cases.
The first one is 0.
The next one is a negative number, because its sign bit is set.
The third one is a positive number.
So it produces three inputs,
and they go down three different paths.
Let's take a look at how KLEE actually executes.
It runs forward
until it encounters a branch.
For example,
it encounters if (x == 0).
It first checks
whether this condition, x == 0,
is fully concrete.
Concrete means a constant condition,
for example, if (1) or if (0).
If it is concrete,
it can decide the branch directly,
and it doesn't need to record the condition.
If it can't decide it,
it records the current condition.
Then it copies the state,
that is, it forks,
and continues executing both paths.
It keeps running until the state terminates,
for example, when it returns or hits an error.
When a state stops,
it starts to solve:
to get to this path,
what input do I need?
It solves the constraints
and gets the answer.
Okay, so the entire process keeps running like this
until there are no states left.
In the end, all states stop,
either by exiting normally
or by hitting an error.
So it runs until all states are gone,
unless you set a timeout yourself;
when it reaches that timeout,
it stops.
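The fork-at-every-branch loop can be sketched on get_sign. This is a toy, not KLEE. The program is written as nested branch and return nodes, and every symbolic branch clones the path constraints. A finished path is "solved" by trying a handful of candidate values instead of calling a real solver. The shortcut for fully concrete conditions is omitted because every condition here depends on x.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
CANDIDATES = [0, 1, -1, INT_MAX, INT_MIN]   # tiny search space standing in for a solver

# Control flow of get_sign(x) as nested nodes:
#   ("if", predicate, then_node, else_node)  or  ("return", value)
GET_SIGN = ("if", lambda x: x == 0, ("return", 0),
            ("if", lambda x: x < 0, ("return", -1), ("return", 1)))

def get_sign(x):
    """Concrete reference implementation, used to check generated test cases."""
    if x == 0:
        return 0
    return -1 if x < 0 else 1

def solve(constraints):
    for x in CANDIDATES:
        if all(pred(x) == want for pred, want in constraints):
            return x
    return None

def explore(program):
    """Fork at every symbolic branch; solve each finished path into a test case."""
    tests = []
    worklist = [([], program)]   # each state: (path constraints, node still to run)
    while worklist:
        constraints, node = worklist.pop()
        if node[0] == "return":  # terminated state: generate a concrete input
            tests.append((solve(constraints), node[1]))
            continue
        _, pred, then_node, else_node = node
        for want, target in ((True, then_node), (False, else_node)):
            forked = constraints + [(pred, want)]   # clone the state for each side
            if solve(forked) is not None:           # drop infeasible forks
                worklist.append((forked, target))
    return tests

for x, expected in explore(GET_SIGN):
    print(f"input {x} -> get_sign returns {expected}")
```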
Okay, now let's see how Triton differs from KLEE.
KLEE has the concept of states:
you reach deeper states by cloning states.
The diagram looks like a tree of states.
Let's take a look at it.
For example,
say our current state is here.
This is the first part;
I will talk about the C library part later.
In the end,
we want to reach state D.
In Triton,
we first take the input that leads to state B.
Then we set the concrete memory and register values
in Triton's context,
so Triton has now reached B.
Then we solve the path to D.
In KLEE,
states actually run in parallel.
When it meets a branch,
it clones the state:
one runs A, and one runs B.
Then, at B's branch,
it clones again:
one runs C, and one runs D.
Since we want to reach D,
it solves for the path to D,
that is, which input reaches D.
If we need to use the C library,
KLEE has a parameter,
--libc=uclibc.
uClibc is a smaller libc
written by others,
usually used in embedded systems.
There is also --posix-runtime,
which provides a POSIX runtime environment.
In practice,
when you make a function call,
for example, to scanf,
or to atoi, which converts a string to an int,
KLEE links uClibc into your program.
A call whose code is not available as LLVM IR
can only be executed concretely.
Executing concretely means that
although you originally marked that data with klee_make_symbolic,
the function cannot be interpreted symbolically
because it is not compiled into LLVM IR,
so your symbolic information
is lost across that library call.
So, in practice, it can't handle
the GNU C library very well.
Next, we are going to introduce angr.
angr is actually more famous;
you may have heard of it before.
This is angr.
It is special because
it supports many different instruction set architectures,
for example ARM, x86, x86-64, MIPS,
and other architectures.
Usually, we load the binary first,
then lift it into IR,
and then run analyses.
The whole process looks like this.
I import angr first,
then load this binary, ais3_crackme,
and initialize a project.
Because this example takes its input from argv,
I create argv as a 100-byte bit-vector.
Next, I initialize its initial state:
the arguments are the program name, crackme1,
followed by the symbolic argv.
This is its initial state.
Finally, I create
what is called a Simulation Manager,
which manages the states.
Next,
the explore function is actually setting a constraint:
find=0x400602
means that the instruction pointer must equal this address.
Finally,
after we start running,
because there may be many resulting states,
we take the found state
and solve for argv; that is our answer.
Yes.
Okay, so this is the example from before,
but it seems the video has already ended.
Yes.
So 0x400602 is the location we want to reach.
At this location,
it prints "Correct", this string,
so we want to reach this location.
This is its code,
with the string we just mentioned.
This run takes about 2 to 3 seconds,
because the states have to keep stepping forward,
which takes some time.
Finally, it finds a suitable input
and prints it out.
What is IR?
When you want to write something that supports different architectures,
you need an IR,
an intermediate representation.
Every architecture handles its registers and memory differently,
so you need to abstract away these differences.
Then you build your symbolic execution on top of the IR.
That way,
you don't need to write different code for different architectures.
angr's IR is VEX,
the IR that Valgrind uses.
I was curious about this,
so I looked into why it uses VEX.
It is because VEX supports lifting binary code very well,
on a lot of architectures.
So angr uses VEX as its IR.
Usually,
IR needs to abstract some things.
That is to say,
the first one is the ratio of NAND.
That is to say,
for example, if you use ARN or something,
compared to TABEL,
it will have a different ratio of NAND,
like R9, R16, and so on.
It will have a different ratio of NAND.
It must be abstracted
so that it can use the same IR expression.
The second one is memory access.
Let me use x86 as the example,
since I am most familiar with it.
When x86 accesses memory,
it uses a register
plus some offset as the address.
That is very different from other architectures.
So the IR expresses memory accesses in one uniform way as much as possible,
and an access is no longer tied to a register like EAX.
The third one is memory segmentation.
For example, have you seen the stack canary?
The stack canary is a
method to prevent buffer overflows.
The canary is stored in a segment,
the GS segment on 32-bit x86.
x86 uses different segment registers
to access its segments,
and the IR understands these accesses too.
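A toy sketch of how an IR can turn x86 addressing modes into explicit address arithmetic on flat memory. Both `mov eax, [ebx+8]` and a segment-relative load such as `mov eax, gs:[0x14]` (where 32-bit Linux typically keeps the canary) become the same GET / ADD / LOAD / PUT sequence. The statement names, the register offsets, and the idea of keeping the GS base in a pseudo-register are assumptions made for illustration.

```python
# Hypothetical offsets; GS_BASE is a made-up pseudo-register holding the GS segment base.
EAX, EBX, GS_BASE = 0, 12, 100


def lower_load(dst, base=None, disp=0, seg_base=None):
    """Lower `mov dst, [base+disp]` or `mov dst, gs:[disp]` into explicit toy IR."""
    if seg_base is not None:
        first = ("t0", "GET", seg_base)  # segment base comes from a register too
    elif base is not None:
        first = ("t0", "GET", base)      # base register
    else:
        first = ("t0", "CONST", 0)       # absolute address
    return [
        first,
        ("t1", "ADD", "t0", disp),       # address arithmetic made explicit
        ("t2", "LOAD", "t1"),            # one uniform flat-memory load
        (None, "PUT", dst, "t2"),
    ]


def run(ir, regs, mem):
    """Execute toy IR; mem is a word-addressed dict (address -> value)."""
    tmp = {}
    for dst, op, *args in ir:
        if op == "GET":
            tmp[dst] = regs.get(args[0], 0)
        elif op == "CONST":
            tmp[dst] = args[0]
        elif op == "ADD":
            tmp[dst] = (tmp[args[0]] + args[1]) & 0xFFFFFFFF
        elif op == "LOAD":
            tmp[dst] = mem.get(tmp[args[0]], 0)
        elif op == "PUT":
            regs[args[0]] = tmp[args[1]]
    return regs


regs = {EBX: 0x1000, GS_BASE: 0x8000}
mem = {0x1008: 0xAA, 0x8014: 0xC0FFEE}
run(lower_load(EAX, base=EBX, disp=8), regs, mem)
print(hex(regs[EAX]))  # 0xaa
run(lower_load(EAX, disp=0x14, seg_base=GS_BASE), regs, mem)
print(hex(regs[EAX]))  # 0xc0ffee
```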
The last one is side effects.
For example,
on a stack architecture,
there are instructions like push and pop.
But push and pop
are destructive.
For example,
if you push something
onto the stack,
the value at the top of the stack
gets overwritten,
and the stack pointer changes as well.
These are side effects.
So the IR has to make these effects explicit.
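Making side effects explicit can be sketched like this: a hypothetical lowering of `push eax` into toy IR statements, where both the stack-pointer update and the memory write appear as separate statements instead of being hidden inside one instruction. The offsets and statement names are made up.

```python
# Hypothetical register offsets.
EAX, ESP = 0, 16


def lower_push(src):
    """`push src` hides two effects; the toy IR spells out both."""
    return [
        ("t0", "GET", ESP),
        ("t1", "SUB", "t0", 4),       # side effect 1: stack pointer moves down
        (None, "PUT", ESP, "t1"),
        ("t2", "GET", src),
        (None, "STORE", "t1", "t2"),  # side effect 2: memory at the new top is overwritten
    ]


def run(ir, regs, mem):
    """Execute the toy IR against a register dict and a word-addressed memory dict."""
    tmp = {}
    for dst, op, *args in ir:
        if op == "GET":
            tmp[dst] = regs[args[0]]
        elif op == "SUB":
            tmp[dst] = (tmp[args[0]] - args[1]) & 0xFFFFFFFF
        elif op == "PUT":
            regs[args[0]] = tmp[args[1]]
        elif op == "STORE":
            mem[tmp[args[0]]] = tmp[args[1]]


regs = {EAX: 0x41, ESP: 0x1000}
mem = {0xFFC: 0xDEAD}  # old data just below the stack top
run(lower_push(EAX), regs, mem)
print(hex(regs[ESP]), hex(mem[0xFFC]))  # 0xffc 0x41
```

Because every effect is its own statement, an analysis only has to understand GET, SUB, PUT and STORE, not the hidden semantics of push.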
For example,
take addl %eax, %ebx.
Lifted to IR,
it looks like the right side.
The left side is x86 code.
The right side is IR.
First we read EAX with GET,
and the 0 is an offset.
That is,
a register in the IR
is actually an offset,
as mentioned earlier.
The 0 is the offset of EAX.
The 12 in the second line is also an offset:
in the second line,
we read the value of EBX.
Next, t1 is
the sum of the two 32-bit integers.
Then PUT also takes an offset,
so we store the result back in EAX.
This is x86,
but if you use x64 or something else,
after lifting to IR,
you get a similar result.
So even with two different architectures,
the lifted IR stays consistent.
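To make the four IR lines concrete, here is a tiny interpreter for exactly the statement forms from the slide (`GET:I32`, `Add32`, `PUT`), operating on a register file indexed by offset. Like the slide, it treats offset 0 as EAX and offset 12 as EBX and ignores condition codes and the instruction pointer. It is a sketch, not pyvex.

```python
import re

# The four statements from the slide (condition-code and EIP updates omitted).
IR_TEXT = """
t3 = GET:I32(0)
t2 = GET:I32(12)
t1 = Add32(t3,t2)
PUT(0) = t1
"""


def run_vex_like(text, regs):
    """Interpret a tiny subset of VEX-style statements over an offset-indexed register file."""
    tmps = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if m := re.fullmatch(r"(t\d+) = GET:I32\((\d+)\)", line):
            tmps[m[1]] = regs[int(m[2])]
        elif m := re.fullmatch(r"(t\d+) = Add32\((t\d+),(t\d+)\)", line):
            tmps[m[1]] = (tmps[m[2]] + tmps[m[3]]) & 0xFFFFFFFF  # 32-bit wrap-around
        elif m := re.fullmatch(r"PUT\((\d+)\) = (t\d+)", line):
            regs[int(m[1])] = tmps[m[2]]
        else:
            raise ValueError(f"unsupported statement: {line}")
    return regs


print(run_vex_like(IR_TEXT, {0: 5, 12: 7}))  # {0: 12, 12: 7}
```

Because the interpreter only sees offsets, an x64 lifting that used the same statement forms would run through it unchanged.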
These are the stash types in angr.
We mainly need to look at the results.
First, active:
active means the states
that are still executing.
Next, there is found.
found means that
when I set a target to find,
the states that reach it
are put into found.
Then we can go and see
what the input is in the found state.
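A toy model of the stash idea: states live in named lists, and each step moves them between `active`, `found` and `deadended`. The `ToySimManager` class and the control-flow graph are hypothetical; angr's real simulation manager has more stashes and options.

```python
from dataclasses import dataclass, field

# Hypothetical control-flow graph: address -> successor addresses.
CFG = {
    0x400500: [0x400520, 0x400610],
    0x400520: [0x400602, 0x400610],
    0x400602: [],  # prints "Correct"
    0x400610: [],  # prints "Wrong" and exits
}


@dataclass
class State:
    addr: int
    history: list = field(default_factory=list)


class ToySimManager:
    """States kept in named stashes, loosely mirroring active / found / deadended."""

    def __init__(self, entry):
        self.stashes = {"active": [State(entry)], "found": [], "deadended": []}

    def step(self, find):
        still_active = []
        for s in self.stashes["active"]:
            if s.addr == find:
                self.stashes["found"].append(s)      # reached the target
            elif not CFG[s.addr]:
                self.stashes["deadended"].append(s)  # nowhere left to go
            else:
                for nxt in CFG[s.addr]:
                    still_active.append(State(nxt, s.history + [s.addr]))
        self.stashes["active"] = still_active

    def explore(self, find):
        while self.stashes["active"]:
            self.step(find)
        return self


simgr = ToySimManager(0x400500).explore(find=0x400602)
print({k: len(v) for k, v in simgr.stashes.items()})  # {'active': 0, 'found': 1, 'deadended': 2}
```

The found state keeps its history, which is the toy counterpart of going back into a found state to inspect how it got there.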
What is different about angr?
Its concept of state is more complete:
it classifies states into different stashes.
We can also
do some processing on a state,
that is,
go to that state
and inspect things in it.
I haven't shown an example of this.
Another very important point
is how it handles library functions.
angr can handle library functions symbolically,
for example, functions in glibc.
It calls these
SimProcedures.
This is the term it uses.
That is,
we replace the original function.
For example,
when we call scanf,
angr changes it to a SimProcedure.
A SimProcedure is a library hook written in Python.
That is,
when you call scanf,
it jumps to the Python code instead.
In fact,
if you really wanted to simulate glibc,
it would be very troublesome,
because the whole system is very complicated.
So it uses a fake hook instead,
that is, a function written in Python
that replaces your glibc code.
Because of this,
you may sometimes get an incorrect result,
or the number of paths may explode.
If you encounter this situation,
you can disable the hook,
use one you wrote yourself,
or fix it.
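The hooking idea can be sketched in plain Python: calls to a hooked name jump to a Python model instead of the real library code, and the hook can be removed or replaced with your own. `ToyMachine`, `hook`, `unhook` and `scanf_model` are hypothetical names, not angr's API.

```python
class ToyMachine:
    """Hypothetical machine whose library calls can be hooked with Python functions."""

    def __init__(self, stdin_text):
        self.stdin = stdin_text.split()  # pre-tokenized standard input
        self.hooks = {}

    def hook(self, name, func):
        self.hooks[name] = func

    def unhook(self, name):
        self.hooks.pop(name, None)

    def call(self, name, *args):
        if name in self.hooks:
            return self.hooks[name](self, *args)  # jump to the Python model instead
        # Without a hook we would have to execute the real library code.
        raise NotImplementedError(f"no model for {name}; would run the real code")


def scanf_model(machine, fmt):
    """A deliberately simplified Python summary of scanf("%d")."""
    if fmt != "%d":
        raise ValueError("toy model only supports %d")
    return int(machine.stdin.pop(0))


m = ToyMachine("42 7")
m.hook("scanf", scanf_model)
print(m.call("scanf", "%d"))             # 42
m.hook("scanf", lambda machine, fmt: 0)  # replace the model with your own
print(m.call("scanf", "%d"))             # 0
```

The simplification is the point: `scanf_model` is cheap to execute, but anything it does not model (other format specifiers, errors) is simply wrong or missing, which is where incorrect results come from.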
Let's look at a simple example:
how scanf is handled.
The source for this
is shown at the top of the slide,
so if you are interested,
you can go and see how it is done in the library.
The first thing we need to do is get the first argument,
because the first argument of scanf is the format string.
Next, we define the return type of the function.
Because angr supports many different architectures,
it must first determine the return type.
Next, it parses the format string.
Then, according to the format string's content,
it reads from the input,
because scanf reads from standard input.
Next,
there are three classes
I hope you can remember,
because they won't be shown on the next page.
If you remember them,
you will understand the next page better.
The first one is SimProcedure.
It has a method called ty_ptr.
As you can see, it uses the arch,
so it decides, according to your architecture,
what a pointer type should look like.
The second one is FormatParser.
It is derived from SimProcedure.
It has a method called _parse.
Its argument is the index of the format string
among the function's arguments,
so for scanf, it is the first one, index 0.
The third one is the parsed format string,
which has an interpret method.
After parsing,
interpret takes what the parser produced
and stores the values into the later arguments of scanf.
That is, it reads data starting from the address parameter
and stores it through args,
which is usually the argument list.
The values are stored starting from
the argument at the given position.
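The three pieces described above can be sketched as toy classes. `ToyProcedure`, `ToyFormatParser`, `ToyFormatString` and their method signatures are assumptions loosely modeled on the description in the talk (the format-string index passed to `_parse`, and `interpret` reading from an address and storing from a given argument position). They are not angr's real classes, and only `%d` and `%s` are supported.

```python
import re


class ToyProcedure:
    """Hypothetical base class standing in for a Python library hook."""

    def __init__(self, arch_bits, args, memory):
        self.arch_bits = arch_bits  # 32 or 64
        self._args = args           # raw call arguments (ints / pointers)
        self.memory = memory        # flat memory as a bytes object

    def arg(self, i):
        return self._args[i]

    def ty_ptr(self, pointee):
        # A pointer type whose width is decided by the architecture.
        return {"kind": "pointer", "to": pointee, "bits": self.arch_bits}

    def load_cstring(self, addr):
        end = self.memory.index(b"\x00", addr)
        return self.memory[addr:end]


class ToyFormatString:
    """Result of parsing: a list of conversions such as ['d', 's']."""

    def __init__(self, specs):
        self.specs = specs

    def interpret(self, addr, startpos, args, region, out):
        """Read from region at addr; store the k-th value at pointer args(startpos + k).
        Returns (end_address, number_of_items_stored)."""
        pos = addr
        for k, spec in enumerate(self.specs):
            pattern = rb"\s*([+-]?\d+)" if spec == "d" else rb"\s*([^\s\x00]+)"
            m = re.compile(pattern).match(region, pos)
            if m is None:
                return pos, k
            out[args(startpos + k)] = int(m[1]) if spec == "d" else m[1]
            pos = m.end()
        return pos, len(self.specs)


class ToyFormatParser(ToyProcedure):
    def _parse(self, fmt_idx):
        """fmt_idx: which call argument holds the format string."""
        fmt = self.load_cstring(self.arg(fmt_idx)).decode()
        return ToyFormatString(re.findall(r"%([ds])", fmt))


mem = b"%d %s\x00"  # format string stored at address 0
p = ToyFormatParser(32, [0, 0x100, 0x104], mem)
fs = p._parse(0)    # scanf: format string is argument 0
out = {}            # destination memory: pointer -> value
print(fs.interpret(0, 1, p.arg, b"  42 hello\n", out), out)
# (10, 2) {256: 42, 260: b'hello'}
```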
This is the whole code.
First, I set its argument types
and its return type.
Then, I parse the format string.
After parsing,
I get the file for fd 0,
that is, I read from fd 0,
which is standard input.
Then, I decide its region.
In fact, this region is optional.
Then, start is
the position in the file descriptor's memory
where reading begins.
Because in angr,
whether it is a file descriptor or something else,
it is treated
as flat memory to read from,
so each one has its own memory addresses.
Then, after reading,
we interpret it:
according to the format string,
we move the data from the input
into the variables that follow.
Finally, we do the read action,
so the file position moves forward.
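Putting the steps together, a hypothetical toy `scanf` could look like this: parse the format string from argument 0, take fd 0's content as a flat region, interpret from the current position, store results from argument 1 onward, then perform the read so the file position advances. For brevity, the format string is passed as a Python string rather than a pointer, and the argument and return types are only noted in comments. All names are illustrative.

```python
import re


class ToyFile:
    """A file descriptor modeled as flat memory plus a read position."""

    def __init__(self, content):
        self.content = content
        self.pos = 0

    def read(self, n):
        """The 'read action': consume n bytes and advance the position."""
        data = self.content[self.pos:self.pos + n]
        self.pos += n
        return data


def parse(fmt):
    return re.findall(r"%([ds])", fmt)  # "%d %s" -> ['d', 's']


def interpret(specs, region, start, startpos, args, out):
    """Store the k-th converted value at pointer args[startpos + k]."""
    pos = start
    for k, spec in enumerate(specs):
        pattern = rb"\s*([+-]?\d+)" if spec == "d" else rb"\s*([^\s\x00]+)"
        m = re.compile(pattern).match(region, pos)
        if m is None:
            return pos, k
        out[args[startpos + k]] = int(m[1]) if spec == "d" else m[1]
        pos = m.end()
    return pos, len(specs)


def toy_scanf(fds, args, out):
    """args[0]: format string; args[1:]: destination pointers.
    Types: argument 0 is a char pointer; the return value is an int (items stored)."""
    specs = parse(args[0])              # format string is argument 0
    f = fds[0]                          # scanf always reads fd 0 (stdin)
    region, start = f.content, f.pos    # the file's flat content and current position
    end, items = interpret(specs, region, start, 1, args, out)  # store from argument 1
    f.read(end - start)                 # consume exactly what was parsed
    return items


fds = {0: ToyFile(b"12 abc 7\n")}
out = {}
print(toy_scanf(fds, ["%d %s", 0x10, 0x20], out), out, fds[0].pos)
# 2 {16: 12, 32: b'abc'} 6
```

A second call continues from position 6, which is why the final read step matters.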
Explained this way,
it may not be clear,
so I use another function
to compare.
I just talked about scanf.
Now, I use sscanf
to compare.
sscanf is a function
whose input is not read from standard input;
it is read directly from a variable, a string.
So, for example,
when scanf parses on the left,
the format string is the first argument,
so the index is 0.
When sscanf parses on the right,
the format string is the second argument,
so the index is 1.
Then, scanf reads from fd 0,
that is, from standard input,
so it gets file 0.
On the right,
sscanf does not read from standard input.
Instead, what it passes is
self.arg(0),
that is, it reads directly from the string.
So, on the left,
reading starts at start,
the current position in standard input,
and the values are stored
starting from argument index 1,
the first argument after the format string.
On the right,
reading starts at
the address of the string,
and the values are stored
starting from argument index 2,
because argument 0 is the string
and argument 1 is the format string.
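The three differences between the two functions (format-string index, input source, and first destination argument) can be put side by side in a toy sketch. These are hypothetical plain-Python functions, not angr's SimProcedures; format strings are Python strings and destinations are plain dict keys standing in for pointers.

```python
import re


def parse(fmt):
    return re.findall(r"%([ds])", fmt)  # only %d and %s in this toy


def interpret(specs, region, start, startpos, args, out):
    """Store the k-th converted value at pointer args[startpos + k]."""
    pos = start
    for k, spec in enumerate(specs):
        pattern = rb"\s*([+-]?\d+)" if spec == "d" else rb"\s*([^\s\x00]+)"
        m = re.compile(pattern).match(region, pos)
        if m is None:
            return pos, k
        out[args[startpos + k]] = int(m[1]) if spec == "d" else m[1]
        pos = m.end()
    return pos, len(specs)


def toy_scanf(args, stdin, out):
    specs = parse(args[0])  # format string is argument 0
    end, items = interpret(specs, stdin["content"], stdin["pos"], 1, args, out)  # from arg 1
    stdin["pos"] = end      # reading from stdin consumes input
    return items


def toy_sscanf(args, out):
    specs = parse(args[1])  # format string is argument 1
    _, items = interpret(specs, args[0], 0, 2, args, out)  # input is arg 0; store from arg 2
    return items


out_s = {}
print(toy_sscanf([b"5 x", "%d %s", 0xA0, 0xB0], out_s))  # 2
stdin = {"content": b"5 x", "pos": 0}
out_f = {}
print(toy_scanf(["%d %s", 0xA0, 0xB0], stdin, out_f))    # 2
print(out_s == out_f, stdin["pos"])                      # True 3
```

Only the three marked parameters change between the two functions; the parsing and interpreting logic is shared.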
Finally,
these are some references.
That's the end of my talk.
